Combining Visual and Acoustic Speech Signals with a Neural Network Improves Intelligibility

Authors

Terrence J. Sejnowski

Ben P. Yuhas

Moise H. Goldstein

Robert E. Jenkins

Abstract

R.E. Jenkins, The Applied Physics Laboratory, The Johns Hopkins University, Laurel, MD 20707

Acoustic speech recognition degrades in the presence of noise. Compensatory information is available from the visual speech signals around the speaker's mouth. Previous attempts at using these visual speech signals to improve automatic speech recognition systems have combined the acoustic and visual speech information at a symbolic level using heuristic rules. In this paper, we demonstrate an alternative approach to fusing the visual and acoustic speech information by training feedforward neural networks to map the visual signal onto the corresponding short-term spectral amplitude envelope (STSAE) of the acoustic signal. This information can be directly combined with the degraded acoustic STSAE. Significant improvements are demonstrated in vowel recognition from noise-degraded acoustic signals. These results are compared to the performance of humans, as well as other pattern matching and estimation algorithms.
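The approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature dimensions, the single tanh hidden layer, the synthetic training data, and the weighted-average fusion rule in the hypothetical `fuse` function are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Small feedforward network: visual features -> estimated spectral envelope."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Return hidden activations and the estimated envelope."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h, h @ params["W2"] + params["b2"]

def mse(params, X, Y):
    """Mean squared error between estimated and target envelopes."""
    _, y_hat = forward(params, X)
    return float(np.mean((y_hat - Y) ** 2))

def train(params, X, Y, lr=0.02, epochs=300):
    """Plain batch gradient descent on the squared error (backpropagation)."""
    n = X.shape[0]
    for _ in range(epochs):
        h, y_hat = forward(params, X)
        err = (y_hat - Y) / n                      # gradient at the output layer
        dh = (err @ params["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
        params["W2"] -= lr * (h.T @ err)
        params["b2"] -= lr * err.sum(axis=0)
        params["W1"] -= lr * (X.T @ dh)
        params["b1"] -= lr * dh.sum(axis=0)
    return params

def fuse(acoustic_env, visual_env, weight):
    """Assumed fusion rule: weighted average of the two envelope estimates.
    weight=1.0 trusts the acoustic envelope only, weight=0.0 the visual one."""
    return weight * acoustic_env + (1.0 - weight) * visual_env

# Synthetic stand-in data (assumption): 20 visual features, 8 spectral bands.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
Y = np.tanh(X @ rng.normal(0.0, 0.3, (20, 8)))

params = init_mlp(20, 16, 8)
mse_before = mse(params, X, Y)
params = train(params, X, Y)
mse_after = mse(params, X, Y)

# Combine a noise-degraded acoustic envelope with the visually estimated one.
noisy_acoustic = Y + rng.normal(0.0, 0.5, Y.shape)
_, visual_estimate = forward(params, X)
fused = fuse(noisy_acoustic, visual_estimate, weight=0.5)
```

In a setup like this, the fusion weight would plausibly be chosen according to the acoustic noise level, leaning more on the visual estimate as the signal-to-noise ratio drops; that choice is also an assumption here.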

Similar resources

Speech Intelligibility in Persian Children with Down Syndrome

Objectives: One of the most effective methods to describe speech disorders is the measurement of speech intelligibility. Speech intelligibility indicates the extent to which the acoustic signals a speaker produces are correctly received by the hearer. The purpose of this study was to investigate speech intelligibility in Persian children with Down syndrome, aged 3 to 5 years, who had spo...

Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)

Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...

Convolutional Neural Network with Adaptive Windows for Speech Recognition

Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Combining Neural Network with Genetic Algorithm for prediction of S4 Parameter using GPS measurement

The ionospheric plasma bubbles cause unpredictable changes in the ionospheric electron density. These variations in the ionospheric layer can cause a phenomenon known as the ionospheric scintillation. Ionospheric scintillation could affect the phase and amplitude of the radio signals traveling through this medium. This phenomenon occurs frequently around the magnetic equator and in low latitu...

Publication date: 1989
